80 research outputs found
Geodesic distances in the intrinsic dimensionality estimation using packing numbers
Dimensionality reduction is an important tool in data mining. The intrinsic dimensionality of a data set is a key parameter in many dimensionality reduction algorithms: when it is known, the dimensionality of the data can be reduced without losing much information. To this end, it is reasonable to estimate the intrinsic dimensionality of the data. In this paper, one of the global estimators of intrinsic dimensionality, the packing numbers estimator (PNE), is explored experimentally. We propose a modification of the PNE method that uses geodesic distances in order to improve its estimates of the intrinsic dimensionality.
Novel Machine learning approach for Self-Aware prediction based on the Contextual reasoning
Machine learning is effective in solving various applied problems. Nevertheless, machine learning methods lack contextual reasoning capabilities and cannot readily use additional information about circumstances, environments, backgrounds, etc. Such information provides essential knowledge about possible reasons for particular actions, but machine learning methods cannot process this knowledge directly. This paper presents a context-aware machine learning approach for contextual reasoning about actor behavior and context-based prediction for threat assessment. Moreover, the proposed approach uses context-aware prediction to handle the interaction between actors. The idea of the technique lies in the cooperative use of two classification methods: the first predicts an actor’s behavior, and the second identifies predicted actions (behavior) that are non-typical or unusual. Integrating the two methods allows an actor to make a self-aware threat assessment based on relations between different actors, where the connections are defined by multidimensional numerical data. The approach predicts the possible further situation and assesses its threat without waiting for future actions. The suggested approach is based on the Decision Tree and Support Vector Method algorithms. Due to the complexity of its context, marine traffic data was chosen to demonstrate the capability of the proposed approach. The technique could serve as an end-to-end approach for safe vessel navigation in maritime traffic with considerable ship congestion.
Visual decisions in the analysis of customers online shopping behavior
The analysis of online customer shopping behavior is an important task, as it allows maximizing the efficiency of advertising campaigns and increasing the return on investment for advertisers. The results of such an analysis are usually reviewed and interpreted by a non-technical person; therefore, they must be displayed in the simplest possible way. Online shopping data is multidimensional and consists of both numerical and categorical data. In this paper, an approach is proposed for the visual analysis of online shopping data and their relevance. It integrates several multidimensional data visualization methods of different nature. The results of the visual analysis of numerical data are combined with the categorical data values. Based on the visualization results, decisions on the advertising campaign can be made in order to increase the return on investment and attract more customers to buy in the online shop.
Geodesic distances in the maximum likelihood estimator of intrinsic dimensionality
When analyzing multidimensional data, we often have to reduce their dimensionality while preserving as much information about the analyzed data set as possible. To this end, it is reasonable to estimate the intrinsic dimensionality of the data. In this paper, two techniques for estimating the intrinsic dimensionality are analyzed and compared, i.e., the maximum likelihood estimator (MLE) and the ISOMAP method. We also propose a way to get good estimates of the intrinsic dimensionality by the MLE method.
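The abstract does not spell out the estimator. As a rough sketch, assuming the Levina-Bickel form of the MLE with plain Euclidean nearest-neighbor distances (not the geodesic variant the paper title refers to), the estimate could be computed as follows. The function name `mle_intrinsic_dimension` and the default `k=10` are illustrative choices, not taken from the paper.

```python
import numpy as np

def mle_intrinsic_dimension(points, k=10):
    """Average the per-point Levina-Bickel estimates over all points.

    Assumes Euclidean distances and no duplicate points
    (a zero neighbor distance would break the logarithm).
    """
    # Full pairwise Euclidean distance matrix (fine for small data sets).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row; column 0 is the zero distance of a point to itself.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    # log(T_k / T_j) for j = 1 .. k-1, where T_j is the j-th neighbor distance.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    per_point = (k - 1) / log_ratios.sum(axis=1)
    return per_point.mean()

# Usage: a flat 2-D sheet embedded in 3-D should give an estimate near 2.
rng = np.random.default_rng(0)
sheet = np.column_stack([rng.random(500), rng.random(500), np.zeros(500)])
estimate = mle_intrinsic_dimension(sheet, k=10)
```

Replacing the Euclidean matrix `dist` with graph shortest-path distances would be one way to experiment with geodesic distances, though how the paper does this is not stated in the abstract.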
Probabilistic Algorithm for Mining Frequent Sequences (Tikimybinis dažnų posekių paieškos algoritmas)
Frequent sequence mining in large databases is important for the analysis of biological, climate, financial, and many other databases. Exact algorithms for frequent sequence mining scan the whole database many times. If the database is large, frequent sequence mining is slow or requires supercomputers. The paper proposes a new probabilistic algorithm for mining frequent sequences that analyzes a random sample of the initial database constructed in a particular way. Based on this analysis, statistical conclusions are drawn about the frequent sequences in the initial database. The algorithm is not exact, but it runs much faster than exact algorithms and is suitable for exploratory statistical analysis. The error probabilities of the probabilistic algorithm are estimated by statistical methods. The probabilistic algorithm can be combined with exact frequent sequence mining algorithms. It can also be applied to the general structure mining problem.
Julija Pragarauskaitė, Gintautas Dzemyda
Summary: Frequent sequence mining in large-volume databases is important in many areas, e.g., biological, climate, and financial databases. Exact frequent sequence mining algorithms usually read the whole database many times, and if the database is large enough, frequent sequence mining takes a very long time or requires supercomputers. A new probabilistic algorithm for mining frequent sequences is proposed. It analyzes a random sample of the initial database. The algorithm makes decisions about the initial database according to the results of the random sample analysis and performs much faster than the exact mining algorithms. The probability of errors made by the probabilistic algorithm is estimated using statistical methods. The algorithm can be used together with the exact frequent sequence mining algorithms.
Special Multilayer Perceptron for Multidimensional Data Visualization (Specialios struktūros daugiasluoksnis perceptronas daugiamačiams duomenims vizualizuoti)
A combination of radial basis functions and a multilayer perceptron is proposed and investigated for visualizing multidimensional data. The proposed visualization approach comprises reducing the dimensionality of multidimensional data using radial basis functions, partitioning the multidimensional data into clusters, determining the numerical values that characterize each cluster, and visualizing the multidimensional data in the last hidden layer of the artificial neural network.
Laura Ringienė, Gintautas Dzemyda
Summary: In this paper, a special feed-forward neural network consisting of a radial basis function layer and a multilayer perceptron is presented. It has been proposed and investigated for multidimensional data visualization. The proposed visualization approach includes data clustering, determining the parameters of the radial basis function, and forming the data set to train the multilayer perceptron. The outputs of the last hidden layer are assigned as coordinates of the visualized points.
Minimization of the mapping error using coordinate descent
Visualization harnesses the perceptual capabilities of humans to provide visual insight into data. Structure-preserving projection methods can be used for multidimensional data visualization. The goal of this paper is to suggest and examine projection error minimization strategies that would yield a better and less distorted projection. The classic algorithm for Sammon’s projection and two new modifications of it are examined. All the algorithms are oriented toward minimizing the projection error, because even a slight reduction in the projection error substantially changes the distribution of points on a plane. The conclusions are drawn from the results of experiments on artificial and real data sets.
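The abstract does not give the error formula. As a hedged illustration, assuming the standard Sammon stress (squared differences between original and projected pairwise distances, weighted by the inverse original distance and normalized by the sum of original distances), the projection error could be evaluated as below. The helper names `pairwise_distances` and `sammon_stress` are hypothetical, and the sketch only measures the error; it does not implement the minimization strategies examined in the paper.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distances for all pairs i < j, returned as a flat array."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(points), k=1)
    return dist[i, j]

def sammon_stress(high_dim, low_dim):
    """Standard Sammon stress between a data set and its projection.

    Assumes no duplicate points in high_dim (zero distances would
    cause a division by zero).
    """
    d_star = pairwise_distances(high_dim)  # distances in the original space
    d = pairwise_distances(low_dim)        # distances in the projection
    return ((d_star - d) ** 2 / d_star).sum() / d_star.sum()

# Usage: points lying in the z = 0 plane lose nothing when z is dropped.
data = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
perfect = sammon_stress(data, data[:, :2])
```

A coordinate-descent style minimizer would repeatedly adjust one coordinate of one projected point at a time and keep the change only if this value decreases.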
Proceedings of the conference "Lithuanian MSc Research in Informatics and ICT" (Konferencijos „Lietuvos magistrantų informatikos ir IT tyrimai“ darbai)
The conference "Lithuanian MSc Research in Informatics and ICT" is a venue for presenting the research of Lithuanian MSc theses in informatics and ICT. The aim of the event is to improve the skills of MSc and other students, familiarize them with the research of other students, and encourage their interest in scientific activities. Students from Kaunas University of Technology and Vilnius University will give their presentations at the conference.
Proceedings of the conference "Lithuanian MSc Research in Informatics and ICT" (Konferencijos „Lietuvos magistrantų informatikos ir IT tyrimai“ darbai)
The conference "Lithuanian MSc Research in Informatics and ICT" is a venue for presenting the research of Lithuanian MSc theses in informatics and ICT. The aim of the event is to improve the skills of MSc and other students, familiarize them with the research of other students, and encourage their interest in scientific activities. Students from Kaunas University of Technology, Vilnius University, and Vytautas Magnus University will give their presentations at the conference.
Analysis of the efficiency of recommendatory systems algorithms in an e-bookshop (Rekomendacinės sistemos algoritmų veikimo elektroninio knygyno duomenų bazėje analizė)
The paper analyzes the performance of recommendatory systems algorithms in the database of a particular online shop. The aim of the analysis is to find, according to selected measures, the recommendatory systems algorithms that perform most effectively on the available database. The paper compares free recommendatory systems software packages and describes an experimental study, carried out with the selected software, of the effectiveness of recommendation algorithms on the available database, with the aim of identifying the best and worst performing algorithms.
Aurimas Rapečka, Virginijus Marcinkevičius, Gintautas Dzemyda
Summary: In the paper, the efficiency of various recommendatory systems algorithms in a data set of a local e-bookshop is analysed. The key goal of the analysis is to determine which algorithms are effective and which are not in the data set used. An analytical review of free or open source recommendatory systems software is presented. Some comparison criteria are selected. According to these criteria, a comparative analysis of popular recommendatory systems software is made, and some experiments with the best evaluated software are carried out. We have determined which algorithms are effective in the data set used for the experiments.
- …